Submit a single zip-compressed file that has the name: YourLastName_Assignment_5 that has the following files:
Create a folder or directory with all supplementary files with your last name at the beginning of the folder name, compress that folder with zip compression, and post the zip-archived folder under the assignment link in Canvas. The following files should be included in an archive folder/directory that is uploaded as a single zip-compressed file. (Use zip, not StuffIt or any 7z or any other compression method.)
Formatting Python Code When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide for Python Code: http://pep8.org/ (Links to an external site.)Links to an external site. There is the Google style guide for Python at https://google.github.io/styleguide/pyguide.html (Links to an external site.)Links to an external site. Comment often and in detail.
In his first state of the union address , president Trump mentioned Chicago violance 10 times Trump's State of the Union Address
Columnist Clarence Page wrote an article , published by the Chicago Tribune stated that the city of Chicago had more homicides in the past two years than New York and Los Angeles combined
The CSV file for crimes dataset for the city of Chicago is obtained from the data portal for the city of Chicago. Here is the link for the city of Chicago data portal City of Chicago Data Portal
Three datasets are need for this assignment:
Lets load the CSV file into a DataFrame object and see the nature of the data that we have.
Complete description of the dataset can be found on Chicago city data portal.
Based on Trumps State of the Union Address and the article written by columnist Clarence Page and published by the Chicago Tribune, we are interested to retrieve the data for the past two years and perform different types of spatial queries.
There are few of these queries that we are interested in to help CPD and city of Chicago to plot on a Choroplteh map those districts that have highest gun crimes.
Here are examples of those types of queries:
Packages you need to Connect PostgreSQL server to load and retrieve Crhicago Crime dataset from the database:
Execute the pip install command from the command window to install the packages listed bove
Execute the following pip show command from the command line to get info about any package you installed. Make sure that the packages got installed into Anaconda library since we are using Anaconda tool; please see below the pakcage info
Since we are using PostGIS in our work, please read and bookmark Chapter 4. Using PostGIS: Data Management and Queries
import folium
from folium import plugins
from folium.plugins import MarkerCluster
import psycopg2
import csv
import pandas as pd
import json
from area import area
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
# To run the script on the complete dataset takes roughly 35 minutes to omplete.
# Use this data set for your final submission of your Assignment 3
# Uncomment the following line after you unit test your code and ready to run and submit your assignment on this dataset
# db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes", user="YourNetID" , password="YourPassword")
# Use the following dataset for unit testing purposes only. It takes roughly 5 minutes to omplete.
# Comment the following line when you are done with your unit testing and ready to run your assignment on the complete dataset and submit your Assignment
db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes_ut", user="pmw9564" , password="")
cursor = db_connection.cursor()
The Crimes_2001_to_present.csv is downloaded from Chicago data portal and it has roughly 6.5 million records.
While working on this dataset, It is prudent to make a note of the following:
Sort algorithms used by the database engines vary in performance between O($N log N$) and O($ N^{2} $) where $N$ is the size of the number
Search algorithms used by the database engines vary in performance between O($log N$) and O($ N $) where $N$ is the size of the number
cursor.execute("SELECT district, count(district) from crimes GROUP BY district")
rows=cursor.fetchall()
crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_crimes'])
crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)
crimes_per_district.head()
crimes_dataset = pd.DataFrame(rows)
crimes_dataset.head()
total_number_of_crimes_per_district_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
total_number_of_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson",
fill_color='OrRd',
fill_opacity=0.5,
line_opacity=1,
data = crimes_per_district,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'number_of_crimes']
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("SELECT district, count(district) from crimes where district= %s GROUP BY district",[police_station[2]])
districts_crime_numbers = cursor.fetchall()
for district in districts_crime_numbers:
folium.Marker(location = police_station_location,popup=folium.Popup(html="District No : %s has Total Number of Crimes:%s" %district ,max_width=450)).add_to(total_number_of_crimes_per_district_map)
total_number_of_crimes_per_district_map
Well, we really need only the violent crimes per district, so we will filter only those crimes that we are interested in. Please note that we are not interested to plot property crimes, we are really after violent crimes and in particular Gun related crimes.
So for now, lets plot violent crimes on Choropleth map and later on we will filter only Gun related crimes
violent_crime_categories='THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL ASSAULT','BATTERY','MURDER'
cursor.execute("SELECT district, count(district) from crimes where PRIMARY_TYPE in %s GROUP BY district",[violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['district_num','number_of_violent_crimes'])
violent_crime_data['district_num'] = violent_crime_data['district_num'].astype(str)
violent_crime_data
violent_crimes_per_district_map= folium.Map(location =(41.8781, -87.6298),zoom_start=11)
violent_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = violent_crime_data,
key_on='feature.properties.dist_num',
columns = ['district_num', 'number_of_violent_crimes'],
legend_name="VOILENT CRIME MAP"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location =(police_station[0],police_station[1])
cursor.execute("SELECT PRIMARY_TYPE, count(PRIMARY_TYPE) from crimes where district =%s AND PRIMARY_TYPE in %s GROUP BY PRIMARY_TYPE",[police_station[2],violent_crime_categories])
data = cursor.fetchall()
violent_crimes_per_district_df = pd.DataFrame(data, columns=['Description', 'Number of Violent Crimes'])
header = violent_crimes_per_district_df.to_html(classes='table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location, popup=folium.Popup(html="District Number %s - Violent Crimes %s" %(police_station[2],header))).add_to(violent_crimes_per_district_map)
violent_crimes_per_district_map
Lets first create a dataframe of gun crimes per district first to get an idea about the number of gun crimes per district
gun='%GUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df
districts_gun_violent_crimes_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
districts_gun_violent_crimes_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
Now, lets create a dataframe of the different types of gun crimes for every district and then plot it on Choropleth map
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DESCRIPTION, count(DESCRIPTION) from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY DESCRIPTION""",[police_station[2],gun])
district_gun_violent_crimes=cursor.fetchall()
district_gun_violent_crimes_df=pd.DataFrame(district_gun_violent_crimes, columns=['Description', 'Number of Gun Crime'])
header = district_gun_violent_crimes_df.to_html(classes='table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location,popup=folium.Popup(html="District No: %s GUN_Crime: %s" %(police_station[2],header) )).add_to(districts_gun_violent_crimes_map)
districts_gun_violent_crimes_map
district=[]
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj=a[i]['geometry']
n= a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/(final_data['district_area_inHectares']/100))
final_data
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block,DESCRIPTION, count(arrest), arrest,latitude, longitude from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY caseno,block, DESCRIPTION,arrest, latitude, longitude""",[police_station[2],gun])
crimes_per_district = cursor.fetchall()
for crime in crimes_per_district:
if crime[4]==True:
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='green', icon='ok-sign'),).add_to(marker_cluster)
else:
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s<br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='red', icon='remove-sign'),).add_to(marker_cluster)
crimes_per_district[:3]
gun_crime_arrests_map
farthest_block_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_block_gun_crime_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
gun='%GUN%'
for police_station in police_stations:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is) from crimes as A, police_stations as B
where ST_Distance(A.where_is,B.where_is) in ( SELECT max(dist) FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A, police_stations as B where A.district=%s
and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
farthest_block_gun_crime = cursor.fetchall()
cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))",(farthest_block_gun_crime[0][2],farthest_block_gun_crime[0][2]))
farthest_block_gun_crime_location = cursor.fetchall()
folium.Marker(location=(police_station[0],police_station[1]),popup=folium.Popup(html="Police Station <br> District No.:%s <br> Farthest Gun_Crime Block:%s"%(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map)
folium.CircleMarker(farthest_block_gun_crime_location[0],radius=5,color='#ff3187',popup=folium.Popup(html="District No.:%s <br> Block:%s"%(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map)
farthest_block_gun_crime_map
The PDF document your are submitting must have the source code and the output for the following requirements
block_max_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
block_max_gun_crime_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_violent_crimes,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="Blocks with highest gun crimes"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
gun='%GUN%'
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1]) # Police station table was created much earlier; we repeatedly use them; position [2] has distr
query = f'''
SELECT block, count(block)
FROM crimes
WHERE crimes.DESCRIPTION ::text LIKE '{gun}' and crimes.district= {police_station[2]}
GROUP BY block HAVING count(block) >= ALL(SELECT count(block) FROM crimes
WHERE crimes.DESCRIPTION::text LIKE '{gun}' and crimes.district = {police_station[2]} GROUP BY block)'''
cursor.execute(query)
data=cursor.fetchall()
#print(data)
block_max_gun_crimes=pd.DataFrame(data, columns=['Block', 'Max Gun Crimes'])
header = block_max_gun_crimes.to_html(classes='table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location,popup=folium.Popup(html="District No: %s GUN_Crime: %s" %(police_station[2],header) )).add_to(block_max_gun_crime_map)
block_max_gun_crime_map
gun='%GUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df
district=[]
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj=a[i]['geometry']
n= a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, districts_gun_violent_crimes_df, on='dist_num', how='inner')
final_data['gun_crime_density'] = round(final_data['gun_crimes']/(final_data['district_area_inHectares']/100))
final_data
UPH='%UNLAWFUL POSS OF HANDGUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[UPH])
districts_UPH_crimes = cursor.fetchall()
districts_UPH_crimes_df = pd.DataFrame(districts_UPH_crimes, columns=['dist_num','UPH_crimes'])
districts_UPH_crimes_df['dist_num'] = districts_UPH_crimes_df['dist_num'].astype(str)
districts_UPH_crimes_df
farthest_block_UPH_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_block_UPH_crime_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_UPH_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'UPH_crimes'],
legend_name="Unlawful Possession of Handgun"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
UPH='%UNLAWFUL POSS OF HANDGUN%'
for police_station in police_stations:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is) from crimes as A, police_stations as B
where ST_Distance(A.where_is,B.where_is) in ( SELECT max(dist) FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A, police_stations as B where A.district=%s
and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],UPH,police_station[2]])
farthest_block_UPH_crime = cursor.fetchall()
if farthest_block_UPH_crime != []:
cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))",(farthest_block_UPH_crime[0][2],farthest_block_UPH_crime[0][2]))
farthest_block_UPH_crime_location = cursor.fetchall()
folium.Marker(location=(police_station[0],police_station[1]),popup=folium.Popup(html="Police Station <br> District No.:%s <br> Farthest UPH crime Block:%s"%(farthest_block_UPH_crime[0][0],farthest_block_UPH_crime[0][1]))).add_to(farthest_block_UPH_crime_map)
folium.CircleMarker(farthest_block_UPH_crime_location[0],radius=5,color='#ff3187',popup=folium.Popup(html="District No.:%s <br> Block:%s"%(farthest_block_UPH_crime[0][0],farthest_block_UPH_crime[0][1]))).add_to(farthest_block_UPH_crime_map)
else:
pass
farthest_block_UPH_crime_map
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block,DESCRIPTION, count(location_description), location_description,latitude, longitude from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY caseno,block, DESCRIPTION,location_description, latitude, longitude""",[police_station[2],gun])
crimes_per_district = cursor.fetchall()
for crime in crimes_per_district:
if crime[4]=="RESIDENCE":
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='green', icon='ok-sign'),).add_to(marker_cluster)
elif crime[4]=="STREET":
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s<br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='red', icon='remove-sign'),).add_to(marker_cluster)
gun_crime_arrests_map